Cluster Characterization through a Representativity Measure

Authors

  • Marie-Jeanne Lesot
  • Bernadette Bouchon-Meunier
Abstract

Clustering is an unsupervised learning task that decomposes a dataset into subgroups which summarize the initial data and reveal its structure. We propose to enrich this result with a numerical coefficient that describes the representativity of each cluster, indicating the extent to which it is characteristic of the whole dataset. The coefficient is defined for a specific clustering algorithm, the Outlier Preserving Clustering Algorithm (OPCA), which detects clusters associated with major trends as well as with marginal behaviors, in order to offer a complete description of the initial dataset. The proposed representativity measure exploits the iterative process of OPCA to compute the typicality of each identified cluster.


Similar resources

Field validation of secondary data sources: a novel measure of representativity applied to a Canadian food outlet database

BACKGROUND: Validation studies of secondary datasets used to characterize neighborhood food businesses generally evaluate how accurately the database represents the true situation on the ground. Depending on the research objectives, the characterization of the business environment may tolerate some inaccuracies (e.g. minor imprecisions in location or errors in business names). Furthermore, if th...


A Multi-Word Term Extraction Program for Arabic Language

Terminology extraction commonly includes two steps: identification of term-like units in the texts, mostly multi-word phrases, and ranking of the extracted term-like units according to their domain representativity. In this paper, we design a multi-word term extraction program for the Arabic language. The linguistic filtering performs a morphosyntactic analysis and takes into account several ty...


Objective criteria to assess representativity of soil fungal community profiles.

Soil fungal community structures are often highly heterogeneous, even among samples taken from small field plots. Sample pooling is widely used to overcome this heterogeneity; however, no objective criteria have yet been defined for determining the number of samples to be pooled to profile a field plot representatively. In the present study PCR/RFLP and T-RFLP analysis of fungal ...


Optimizing Spatial Declustering Weights – Comparison of Methods

Analysis of a spatial phenomenon is to a great extent affected by the frequent irregular structures and/or the preferential clustering of the sampling schemes. To obtain representative statistics for an area of interest, the influence of clustered measurements needs to be reduced by attributing them lower weights. In this case study, two standard methods, the polygonal and the cell-declustering...


New Developments in Representativity Approach to study Advanced Assembly Concepts in the EOLE Critical Facility

A new representativity approach, based on sensitivity analysis of integral parameters to nuclear data, is developed in the field of Advanced Assemblies Concepts (AAC) design. The adopted scheme proposes an original approach to the problem, going from the initial « microscopic » pin-cells integral parameters to the whole « macroscopic » assembly integral parameters. The originality of the prese...




Publication date: 2004